Final Project STAT 331

Author

Emma Turilli, John Ieng, Nat Sakamoto, Gabby Apsay

Reproducibility

All code, raw data, and project files are available in our GitHub Repository. Feel free to explore or replicate our analysis!

1 Project Proposal + Data

This analysis uses the life expectancy and gross domestic product (GDP) datasets sourced from Gapminder, a non-profit organization whose mission “is to fight devastating ignorance with a fact-based world view everyone could understand.” Their site provides data sets collected from many reputable sources, along with interactive visualizations on important world topics.

1.1 Data Cleaning

In the raw GDP dataset, some values included a “k” suffix to represent thousands of dollars (e.g., 10,000 written as 10k). The first step was therefore to convert the GDP values into numeric form. To keep the values consistent, we created a function that converts these abbreviated values into their full numeric form, allowing for accurate numeric comparisons. Without this step, any observation containing a “k” would fail to convert and be left empty, which could affect later analysis.
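The conversion described above can be sketched as follows. This is a minimal Python illustration, not the function used in the project (the analysis itself was done in R); the name `parse_gdp` and the sample values are hypothetical.

```python
def parse_gdp(value):
    """Convert a raw GDP entry such as "10k" or "9870" to a float.

    Returns None for empty or unparseable entries so that they can be
    handled explicitly instead of silently disappearing.
    """
    text = str(value).strip().lower().replace(",", "")
    multiplier = 1.0
    if text.endswith("k"):  # "k" marks thousands of dollars
        multiplier = 1_000.0
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None


# Hypothetical sample entries, for illustration only.
raw_values = ["10k", "12.5k", "9870", ""]
cleaned = [parse_gdp(v) for v in raw_values]
print(cleaned)
```

Returning `None` rather than raising keeps missing entries visible, so they can be counted or filtered in a later step.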

1.2 Pivoting Longer

The life expectancy data covers 196 countries from 1800 to 2100 and reports the life expectancy in years for each country and year. For 1800 to 1970, the data comes from version 7 of Gapminder’s main source, compiled by Mattias Lindgren. Data for 1950-2019 comes from the Global Burden of Disease Study 2019, published by the IHME. For 2020-2100, Gapminder used UN forecasts from the World Population Prospects 2022.

Life Expectancy Info from: https://www.gapminder.org/data/documentation/gd004

The GDP data was obtained from the Maddison Project Database (MPD) and Penn World Table (PWT). This data set contains gross domestic product (GDP) per person, adjusted for differences in purchasing power and expressed in international dollars at fixed 2017 prices. GDP per capita measures the value of everything a country produces during a year, divided by the number of people. We transformed the data to have columns containing the country, year, and GDP.

GDP Info from: https://www.gapminder.org/data/documentation/gd001/

We collapsed the individual year columns into a single year column so that the dataset would be easier to read. As a result, each observation consists of one country and year, with the corresponding life expectancy. The raw GDP data is structured like the life expectancy data, with one column per year, so we transformed it in the same way, making year its own column alongside the corresponding GDP.
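The wide-to-long reshaping described above can be illustrated with a small Python sketch. It is not the project's code; the helper `pivot_longer`, its parameter names, and the two sample rows are made up for illustration.

```python
def pivot_longer(wide_rows, id_col, names_to, values_to):
    """Turn one-column-per-year rows into one row per (id, year) pair."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_col: row[id_col], names_to: int(column), values_to: value}
            )
    return long_rows


# Hypothetical wide data: one row per country, one column per year.
wide = [
    {"country": "A", "1800": 30.0, "1801": 30.5},
    {"country": "B", "1800": 35.0, "1801": 35.2},
]
long_rows = pivot_longer(wide, "country", "year", "life_expectancy")
for r in long_rows:
    print(r)
```

Each output row now holds one country, one year, and one value, which is the shape the later join and regression steps work with.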

1.3 Joining Datasets

After cleaning each data set, we joined the two by our observational unit, country. We hypothesize that as GDP increases, life expectancy will also increase, because a higher GDP tends to go along with better infrastructure and broader, better access to healthcare and medicine.
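A join of the two long tables might look like the Python sketch below. It assumes the rows are matched on both country and year, since each row after pivoting is a country-year pair; that key choice, the helper `inner_join`, and the sample rows are assumptions for illustration, not the project's code.

```python
def inner_join(left_rows, right_rows, keys):
    """Keep rows whose key values appear in both tables and merge them.

    Assumes each key combination occurs at most once in right_rows.
    """
    index = {tuple(row[k] for k in keys): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical long-format rows.
life = [
    {"country": "A", "year": 1800, "life_expectancy": 30.0},
    {"country": "B", "year": 1800, "life_expectancy": 35.0},
]
gdp = [
    {"country": "A", "year": 1800, "gdp": 1200.0},
    {"country": "C", "year": 1800, "gdp": 900.0},
]
combined = inner_join(life, gdp, keys=("country", "year"))
print(combined)
```

An inner join drops country-years that appear in only one table, so it is worth checking how many rows are lost before modeling.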

2 Linear Regressions

2.1 Data Visualization

2.2 Linear Regression


Call:
lm(formula = life_expectancy ~ avg_gdp, data = gdp_lex_mean)

Residuals:
    Min      1Q  Median      3Q     Max 
-59.329 -19.294  -2.221  20.228  40.524 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.809e+01  1.306e-01  368.26   <2e-16 ***
avg_gdp     4.073e-04  7.198e-06   56.58   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.87 on 57343 degrees of freedom
Multiple R-squared:  0.05288,   Adjusted R-squared:  0.05286 
F-statistic:  3202 on 1 and 57343 DF,  p-value: < 2.2e-16
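
For readers unfamiliar with how the intercept and slope in the output above are obtained, the Python sketch below computes a simple least-squares line from the usual closed-form formulas. It is an illustration only: the helper `fit_line` and the four data points are made up, and it does not reproduce the standard errors or p-values that R's `lm` reports.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sum of cross-deviations over the sum of squared x-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical points lying exactly on y = 50 + 0.0005x.
gdp_values = [1000, 2000, 3000, 4000]
life_values = [50.5, 51.0, 51.5, 52.0]
intercept, slope = fit_line(gdp_values, life_values)
print(intercept, slope)
```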

Regression Model Estimates by Continent
Average Life Expectancy vs Average GDP

Continent / Term     Estimate   Std. Error   t-Statistic   p-Value [1]
Asia
  (Intercept)       49.612363     0.967254     51.291992   0.000000
  avg_gdp            0.000151     0.000054      2.826570   0.007052
Africa
  (Intercept)       45.545427     0.468580     97.198742   0.000000
  avg_gdp            0.000608     0.000076      7.955113   0.000000
Europe
  (Intercept)       53.359873     0.980483     54.422023   0.000000
  avg_gdp            0.000296     0.000031      9.637500   0.000000
South America
  (Intercept)       55.974115     1.968199     28.439260   0.000000
  avg_gdp           −0.000080     0.000135     −0.594425   0.565433
North America
  (Intercept)       48.092682     1.902205     25.282599   0.000000
  avg_gdp            0.000591     0.000114      5.162279   0.000041
Oceania
  (Intercept)       47.159429     2.647934     17.809898   0.000000
  avg_gdp            0.000724     0.000177      4.099796   0.001473

[1] P-values below 0.05 indicate statistical significance.

\[ \hat{y} = 49.6 + 0.000151x \]

  • Intercept: When the average GDP of Asia is \(0\) dollars, the model predicts an average life expectancy in Asia of \(49.6\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in Asia, the predicted average life expectancy in Asia increases by \(0.000151\) years.

\[ \hat{y} = 45.5 + 0.000608x \]

  • Intercept: When the average GDP of Africa is \(0\) dollars, the model predicts an average life expectancy in Africa of \(45.5\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in Africa, the predicted average life expectancy in Africa increases by \(0.000608\) years.

\[ \hat{y} = 53.4 + 0.000296x \]

  • Intercept: When the average GDP of Europe is \(0\) dollars, the model predicts an average life expectancy in Europe of \(53.4\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in Europe, the predicted average life expectancy in Europe increases by \(0.000296\) years.

\[ \hat{y} = 56 - 0.0000803x \]

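  • Intercept: When the average GDP of South America is \(0\) dollars, the model predicts an average life expectancy in South America of \(56\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in South America, the predicted average life expectancy in South America decreases by \(0.0000803\) years. This slope is not statistically significant (p-value of 0.565), so the data do not show a clear linear relationship for this continent.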

\[ \hat{y} = 48.1 + 0.000591x \]

  • Intercept: When the average GDP of North America is \(0\) dollars, the model predicts an average life expectancy in North America of \(48.1\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in North America, the predicted average life expectancy in North America increases by \(0.000591\) years.

\[ \hat{y} = 47.2 + 0.000724x \]

  • Intercept: When the average GDP of Oceania is \(0\) dollars, the model predicts an average life expectancy in Oceania of \(47.2\) years.

  • Slope: For each additional \(1\) dollar increase in average GDP in Oceania, the predicted average life expectancy in Oceania increases by \(0.000724\) years.
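
Because a one-dollar change gives very small numbers, it can help to evaluate the fitted equations at a larger step. The Python sketch below plugs the rounded coefficients from the equations above into hypothetical helpers (`predict_life_expectancy`, `change_per`); the 10,000-dollar step is an arbitrary choice for illustration.

```python
# (intercept, slope) pairs copied from the rounded equations above.
CONTINENT_MODELS = {
    "Asia": (49.6, 0.000151),
    "Africa": (45.5, 0.000608),
    "Europe": (53.4, 0.000296),
    "South America": (56.0, -0.0000803),
    "North America": (48.1, 0.000591),
    "Oceania": (47.2, 0.000724),
}


def predict_life_expectancy(continent, avg_gdp):
    """Predicted average life expectancy (years) at a given average GDP."""
    intercept, slope = CONTINENT_MODELS[continent]
    return intercept + slope * avg_gdp


def change_per(continent, gdp_step=10_000):
    """Predicted change in life expectancy for a gdp_step dollar increase."""
    return CONTINENT_MODELS[continent][1] * gdp_step


for name in CONTINENT_MODELS:
    print(name, round(change_per(name), 2))
```

Under these equations, a 10,000-dollar increase in average GDP corresponds to a predicted change of about 1.51 years in Asia and about 6.08 years in Africa.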

2.3 Model Fit

Variances

Response   Fitted Values   Residuals
459.75     24.31           435.43
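
These three variances connect directly to the regression output in section 2.2. The Python sketch below works through the arithmetic; the variable names are illustrative, and the square-root comparison is only approximate because the residual standard error uses a slightly different denominator.

```python
import math

# Variances reported in the table above.
var_response = 459.75
var_fitted = 24.31
var_residuals = 435.43

# Share of the variation in life expectancy explained by the model (R-squared).
r_squared = var_fitted / var_response

# Share left unexplained.
unexplained = var_residuals / var_response

# The square root of the residual variance is close to the residual standard error.
residual_sd = math.sqrt(var_residuals)

print(round(r_squared, 4), round(unexplained, 4), round(residual_sd, 2))
```

The ratio of fitted to response variance is about 0.0529, which agrees with the Multiple R-squared of 0.05288 in the model summary: average GDP accounts for roughly 5% of the variation in life expectancy in this model.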